On Supervised On-Line Rolling-Horizon Control for Infinite-Horizon Discounted Markov Decision Processes
Authors
Abstract
This note revisits the rolling-horizon control approach to the problem of a Markov decision process (MDP) with the infinite-horizon discounted expected reward criterion. In contrast with classical value-iteration approaches, we develop an asynchronous on-line algorithm based on policy iteration, integrated with a multi-policy improvement method of policy switching. A sequence of monotonically improving solutions to a forecast-horizon sub-MDP is generated by updating the current solution only at the currently visited state, building in effect a rolling-horizon control policy for the MDP over the infinite horizon. Feedback from “supervisors,” if available, can also be incorporated while updating. We focus on the convergence issue in relation to the transition structure of the MDP. Either a globally optimal policy or a “locally optimal” fixed policy is achieved in finite time, depending on that structure.
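The asynchronous, single-state policy-improvement idea described in the abstract can be sketched on a toy problem. Everything below (the 3-state MDP, its rewards, and the trajectory loop) is an illustrative assumption, not code or data from the paper: the policy is improved only at the state currently being visited, which still yields monotone improvement of the policy's value.

```python
import numpy as np

# A toy 3-state, 2-action MDP (all numbers are illustrative).
P = np.array([  # P[a, s, s']: transition probabilities
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]],   # action 0
    [[0.2, 0.8, 0.0], [0.0, 0.2, 0.8], [0.5, 0.0, 0.5]],   # action 1
])
R = np.array([[0.0, 1.0, 2.0], [0.5, 0.0, 3.0]])  # R[a, s]: expected reward
gamma = 0.9

def policy_value(pi):
    """Exact value of a stationary policy via the linear Bellman equation."""
    Ppi = np.array([P[pi[s], s] for s in range(3)])
    rpi = np.array([R[pi[s], s] for s in range(3)])
    return np.linalg.solve(np.eye(3) - gamma * Ppi, rpi)

# Asynchronous on-line improvement: along a simulated trajectory, update the
# policy only at the currently visited state (greedy w.r.t. the current
# policy's value), then follow the updated policy to the next state.
rng = np.random.default_rng(0)
pi = np.zeros(3, dtype=int)               # initial policy: always action 0
s = 0
for _ in range(200):
    v = policy_value(pi)
    q = R[:, s] + gamma * P[:, s] @ v     # Q(s, a) at the visited state only
    pi[s] = int(np.argmax(q))             # improve at s alone
    s = rng.choice(3, p=P[pi[s], s])      # move under the updated policy

print(pi, policy_value(pi))
```

Each single-state improvement step is a standard policy-iteration improvement restricted to one state, so the value of the policy never decreases at any state.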
Related resources
On the Use of Non-Stationary Policies for Infinite-Horizon Discounted Markov Decision Processes
We consider infinite-horizon γ-discounted Markov Decision Processes, for which it is known that there exists a stationary optimal policy. We consider the algorithm Value Iteration and the sequence of policies π1, . . . , πk it implicitly generates until some iteration k. We provide performance bounds for non-stationary policies involving the last m generated policies that reduce the state-of-t...
Information Relaxation Bounds for Infinite Horizon Markov Decision Processes
We consider the information relaxation approach for calculating performance bounds for stochastic dynamic programs (DPs), following Brown, Smith, and Sun (2010). This approach generates performance bounds by solving problems with relaxed nonanticipativity constraints and a penalty that punishes violations of these constraints. In this paper, we study infinite horizon DPs with discounted costs a...
Average Optimality in Nonhomogeneous Infinite Horizon Markov Decision Processes
We consider a nonhomogeneous stochastic infinite horizon optimization problem whose objective is to minimize the overall average cost per period of an infinite sequence of actions (average optimality). Optimal solutions to such problems will in general be non-stationary. Moreover, a solution which initially makes poor decisions, and then selects wisely thereafter, can be average optimal. Howeve...
Infinite Horizon Discounted Cost Problems
• We often approximate a large number of periods, even if the horizon is known and finite, by assuming an infinite number of periods, and hope that this assumption will simplify the solution. Indeed, even if the general theory becomes more involved, the solution obtained often is simpler and has important computational and conceptual advantages: in particular, the optimal policy is often statio...
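The classical view referenced here can be sketched with value iteration on a hypothetical 2-state discounted MDP (all numbers are illustrative): the Bellman operator is a γ-contraction, so iterating it converges geometrically to V*, and a greedy policy with respect to V* is stationary and optimal.

```python
import numpy as np

# Illustrative 2-state, 2-action discounted MDP.
P = np.array([
    [[0.7, 0.3], [0.4, 0.6]],   # action 0
    [[0.1, 0.9], [0.8, 0.2]],   # action 1
])
R = np.array([[1.0, 0.0], [0.0, 2.0]])  # R[a, s]
gamma = 0.95

# Value iteration: apply the Bellman optimality operator until the
# successive iterates stop changing (guaranteed by the gamma-contraction).
V = np.zeros(2)
for _ in range(2000):
    Q = R + gamma * np.einsum('ast,t->as', P, V)   # Q[a, s]
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

pi_star = Q.argmax(axis=0)   # greedy policy w.r.t. V*: stationary and optimal
print(V, pi_star)
```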
Solving Infinite Horizon Discounted Markov Decision Process Problems for a Range of Discount Factors
In this paper we will assume the following framework. There is a finite state set I, with i ∈ I as its generic member, 1 ≤ i ≤ m. For each i ∈ I, there is a finite action set K(i), with k ∈ K(i) as its generic member. For each i ∈ I, k ∈ K(i), there is a transition probability, p(i, j, k), that if at a decision epoch the state is i ∈ I, and if action k ∈ K(i) is taken, then the state will be j ...
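The framework above (finite state set I, per-state action sets K(i), transitions p(i, j, k)) can be encoded directly. The toy instance below is a hypothetical example, not from the paper, solved for two discount factors to show that the optimal stationary policy may change with the factor.

```python
import numpy as np

# Hypothetical instance of the framework: m states, action sets K(i),
# sparse transitions p(i, j, k), and immediate rewards r(i, k).
m = 2
K = {0: [0, 1], 1: [0]}                        # K(i): allowed actions at state i
p = {(0, 0, 0): 1.0,                           # p(i, j, k)
     (0, 1, 1): 1.0,
     (1, 1, 0): 1.0}
r = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 5.0}    # r(i, k)

def value_iteration(beta, iters=2000):
    """Optimal values for discount factor beta by iterating the Bellman operator."""
    V = np.zeros(m)
    for _ in range(iters):
        V = np.array([max(r[(i, k)] + beta * sum(p.get((i, j, k), 0.0) * V[j]
                                                 for j in range(m))
                          for k in K[i]) for i in range(m)])
    return V

for beta in (0.1, 0.9):
    V = value_iteration(beta)
    pi = [max(K[i], key=lambda k: r[(i, k)] + beta * sum(
              p.get((i, j, k), 0.0) * V[j] for j in range(m)))
          for i in range(m)]
    print(beta, pi)
```

In this instance, staying at state 0 earns 1 per period, while moving to the absorbing state 1 forgoes one period's reward but then earns 5 per period; the switch is worthwhile only when the discount factor is large enough (here, beta > 0.2).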
Journal
Journal title: IEEE Transactions on Automatic Control
Year: 2023
ISSN: 0018-9286, 1558-2523, 2334-3303
DOI: https://doi.org/10.1109/tac.2023.3274791